Working Efficiently with Large SAS® Datasets
Abstract
Many SAS users struggle with large SAS datasets that have millions of rows, hundreds of columns, and a size close to a gigabyte or more. Processing such datasets often takes a long time, which can affect delivery timelines, and storing them can use up permanent storage space or exhaust the available memory. To handle these constraints, a large dataset can be made smaller by reducing the number of observations or variables, or by reducing the size of the variables, without losing any of its information. This paper focuses on techniques, supported by test results, for reducing the size of huge datasets and working with them efficiently by balancing processing time against resource limitations. Some of the efficiency techniques described in this paper can also be applied to smaller datasets.
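The size-reduction ideas named in the abstract can be illustrated with a minimal sketch. It is not taken from the paper. It assumes SASHELP.CLASS (a small sample table shipped with SAS) as a stand-in for a large dataset, and WORK.CLASS_SMALL is a hypothetical output name. It uses the standard KEEP= and WHERE= dataset options, the LENGTH statement, and the COMPRESS= dataset option.

```sas
/* Sketch only: SASHELP.CLASS stands in for a large dataset;
   WORK.CLASS_SMALL is a hypothetical output name. */
data work.class_small (compress=yes);        /* compress the output dataset */
    length age 3;                            /* store Age in 3 bytes instead of the default 8 */
    set sashelp.class (keep=name sex age     /* read only the variables that are needed */
                       where=(age >= 13));   /* read only the observations that are needed */
run;

/* Inspect variable lengths and the compression setting of the result */
proc contents data=work.class_small;
run;
```

Shortening a numeric variable is safe only for integers small enough to be stored exactly in the reduced length, and compression can increase the size of a very small dataset, so both choices are worth checking against the actual data.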
Similar resources
Datamapper: A Documentation Generator for SAS Metadata
SAS metadata is the data about SAS datasets, which is critical to the effective manipulation and analysis of SAS data. The SAS metadata exploration traditionally proceeds as ad-hoc programming on SAS Dictionary tables, which is inadequate, inefficient, and sometimes complicated. Based on SAS ODBC functionality and hypertext techniques, a documentation generator called Datamapper is being develop...
When PROC APPEND May Make More Sense Than the DATA STEP
Virtually all SAS programmers (with apologies to diehard SQL codeslingers) tend to use a simple SET statement in the DATA STEP when concatenating two or more datasets in their programs. Use of the SET statement is generally the most logical and practical approach because the required code is typically very succinct and the process in most instances involves concatenation of only two datasets. H...
Adding the Power of DataFlux® to SAS® Programs Using the DQMATCH Function
The SAS Data Quality Server allows SAS programmers to integrate the power of DataFlux into their data cleaning programs. The power of SAS Data Quality Server enables programmers to efficiently identify matching records across different datasets when exact matches are not present. During a recent educational research project, the DQMATCH function proved very capable when trying to link records f...
XML in the DATA Step
SAS Institute’s (SI) XMLMAP imports XML via the LIBNAME XML engine and a mapping file, by-passing the DATA-step. For export of XML, SI offers custom tagsets and the Output Delivery System (ODS). Custom tagsets are built with PROC TEMPLATE, by-passing the DATA-step. By contrast, the DATA-step method discussed in this paper uses a single, uniform methodology to import, export, and transform user-d...
Contrasting programming techniques for summarizing voluminous SAS output using the SAS Output Delivery System (ODS) (PROC FREQ as an example) Stuart Long, Westat
SAS® ODS provides programmers with the ability to extract selected information from a procedure and store it in datasets. Such datasets can then be combined to summarize the results from numerous procedures. The SAS Macro facility can be used to execute and extract information from repetitively called SAS procedures. “Macro Variable Arrays” can simplify the extraction of information from SAS pr...
Publication date: 2010